Estimation of Bacterial Densities by Means of the "Most Probable Number" 

William G. Cochran 



Biometrics, Vol. 6, No. 2. (Jun., 1950), pp. 105-116.
Biometrics is currently published by International Biometric Society. 






ESTIMATION OF BACTERIAL DENSITIES BY MEANS OF 
THE "MOST PROBABLE NUMBER"* 

William G. Cochran
School of Hygiene and Public Health 
The Johns Hopkins University 

Presented at a joint session of the Engineering, Laboratory and Statistics
Sections of the American Public Health Association, Biometrics Section of the
American Statistical Association, and the Biometric Society, New York,
October 27, 1949.



INTRODUCTION 

This paper attempts to give a simple account of the concept of the
"most probable number" (m.p.n.) of organisms in the dilution
method. The concept is quite old, going back to McCrady (4) in 1915, 
and has been discussed by various writers from time to time, so that 
little of what I shall present is new. In addition, some advice is given on 
the planning of dilution series. 

The dilution method is a means for estimating, without any direct 
count, the density of organisms in a liquid. It is used principally for 
obtaining bacterial densities in water and milk. The method consists 
in taking samples from the liquid, incubating each sample in a suitable 
culture medium, and observing whether any growth of the organism 
has taken place. The estimation of density is based on an ingenious 
application of the theory of probability to certain assumptions. For a 
biologist, it is more important to be clear about these assumptions than 
about the details of the mathematics, which are rather intricate. 

ASSUMPTIONS 

There are two principal assumptions. In statistical language, the 
first is that the organisms are distributed randomly throughout the liquid. 
This means that an organism is equally likely to be found in any part of 
the liquid, and that there is no tendency for pairs or groups of organisms 
either to cluster together or to repel one another. In practice this implies 
that the liquid is thoroughly mixed, and if the volume of liquid is not too 
great some shaking device is usually employed for this purpose. 



*Paper 254 from the Department of Biostatistics.

The second assumption is that each sample from the liquid, when
incubated in the culture medium, is certain to exhibit growth whenever 
the sample contains one or more organisms. If the culture medium is 
poor, or if there are factors which inhibit growth, or if the presence of 
more than one organism is necessary to initiate growth, the m.p.n. gives 
an underestimate of the true density. 

MATHEMATICAL ANALYSIS 

In the mathematical analysis we relate the probability that there will 
be no growth in a sample to the density of organisms in the original liquid. 
Suppose that the liquid contains V ml., the sample contains v ml., and 
that there are actually b organisms in the liquid. By the second assump- 
tion, there will be no growth if and only if the sample contains no organ- 
isms. We will calculate the probability that none of these b organisms is 
in the sample. 

Consider a single organism. By the first assumption, the probability 
that it lies in the sample is simply the ratio of the volume of the sample 
to that of the liquid, i.e. v/V. The probability that it is not in the sample 
is therefore (1 - v/V). Since there is assumed to be no kind of attraction
or repulsion between organisms, these two probabilities hold for
any organism, irrespective of the positions of the other organisms.
(Strictly, this requires the additional assumption that the space occupied
by an organism is negligible relative to v.) Consequently, by the
multiplication theorem in probability, the probability that none of the b
organisms is in the sample is

$$p = (1 - v/V)^{b}.$$

When v/V is small, this is closely approximated by

$$p = e^{-vb/V},$$

where e, about 2.7, is the base of natural logarithms. Finally, since
b/V is the density S of organisms per ml., we have

$$p = e^{-vS},$$

where p is the probability that the sample is sterile.
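
As a quick numerical check of this approximation, the following minimal sketch compares the exact product form with the exponential form. The function names and the particular values of V, v and b are illustrative assumptions, not figures from the text.

```python
import math

def p_sterile_exact(v, V, b):
    """Exact probability that none of b organisms lies in a sample of v ml. out of V ml."""
    return (1.0 - v / V) ** b

def p_sterile_approx(v, density):
    """Exponential approximation p = exp(-v * S), where S = b / V organisms per ml."""
    return math.exp(-v * density)

# Assumed illustrative values: 1000 ml. of liquid, a 1 ml. sample, 2000 organisms.
V, v, b = 1000.0, 1.0, 2000
exact = p_sterile_exact(v, V, b)
approx = p_sterile_approx(v, b / V)
print(f"exact = {exact:.4f}, approx = {approx:.4f}")
```

With v/V as small as 0.001 the two values agree to about three decimal places; the agreement would be expected to worsen as the sample becomes a larger fraction of the liquid.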

THE CASE OF A SINGLE DILUTION 

If n samples, each of volume v, are taken, and if s of these are found 
to be sterile, the proportion s/n of sterile samples is an estimate of p. 
Hence we obtain an estimate d of the density S by the equation

$$\frac{s}{n} = e^{-vd}.$$

This gives

$$d = -\frac{1}{v} \ln\frac{s}{n} = -\frac{2.303}{v} \log\frac{s}{n}, \qquad (1)$$

where ln and log stand for logarithms to base e and to base 10
respectively.
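
Equation (1) can be evaluated directly. The sketch below is one way to do so; the function name `mpn_single` and the sample counts are hypothetical, chosen only for illustration.

```python
import math

def mpn_single(s, n, v):
    """Estimate d = -(1/v) ln(s/n) from s sterile samples out of n, each of v ml."""
    if n <= 0 or v <= 0 or not 0 <= s <= n:
        raise ValueError("need n > 0, v > 0 and 0 <= s <= n")
    if s == 0:
        return math.inf      # no sterile samples: ln(0) makes the estimate infinite
    if s == n:
        return 0.0           # every sample sterile: ln(1) = 0
    return -math.log(s / n) / v

# Hypothetical data: 10 samples of 0.1 ml., of which 3 are sterile.
d = mpn_single(s=3, n=10, v=0.1)
print(round(d, 2))   # about 12.04 organisms per ml.
```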

The estimate d is the "most probable number" of organisms per ml. 
The derivation given here does not reveal why this name has been 
ascribed to the estimate. In fact, the concept of m.p.n. is scarcely needed 
for this simple case. We will, however, reexamine the analysis so as to 
introduce the concept, which becomes useful in the more complex situa- 
tion where several dilutions are used. 

If p is the probability that a sample is sterile, the probability that s
out of n samples are sterile is given by the binomial distribution as

$$\frac{n!}{s!\,(n - s)!}\, p^{s} (1 - p)^{n - s}. \qquad (2)$$

Since p = e^{-vS}, this expression may be written

$$\frac{n!}{s!\,(n - s)!}\, e^{-vSs} \left(1 - e^{-vS}\right)^{n - s}. \qquad (3)$$

If we have obtained s sterile samples out of n, this formula enables us to
plot the probability of this event against the true density S. Such curves
always have a single maximum.

A curve of this type suggests a method for estimating S which is
plausible on intuitive grounds. For if we are considering two possible
values of S, it seems reasonable to prefer the one which gives a higher
probability to the result that was actually observed. This argument,
carried to its conclusion, leads to a choice of the value of S for which
the probability of obtaining the observed result is greatest. It is this
value of S that has been called the "most probable number" of organisms.
It can be shown mathematically that this is the value of S for which
p = s/n. Consequently the m.p.n. is the same as the estimate previously
given.

In practice, more than one dilution is usually needed. The reason is
that the precision of the m.p.n. is very poor when the volume v in the
sample is such that the samples are likely to be all fertile or all sterile.
When all are fertile, the maximum on the probability curve (3) occurs
when S is infinite, so that the estimated density is infinite. When all are
sterile the estimated density is zero, as may also be verified from equation
(1). Thus a single dilution is successful only if v happens to be chosen
so that some samples are sterile and some are fertile. Such a choice of
v can be made only if the density S is known fairly closely in advance.
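
The claim that the probability curve peaks at the value given by equation (1) can be checked numerically. The sketch below evaluates expression (3) on a grid of trial densities and picks the largest value; the grid, the function name and the data are assumptions made for illustration, and a grid search is only a crude substitute for the mathematical argument.

```python
import math

def prob_observed(s, n, v, density):
    """Expression (3): probability of s sterile samples out of n at a trial density."""
    p = math.exp(-v * density)
    return math.comb(n, s) * p ** s * (1.0 - p) ** (n - s)

# Hypothetical data: 10 samples of 0.1 ml., of which 3 are sterile.
s, n, v = 3, 10, 0.1
grid = [i / 100 for i in range(1, 5001)]            # trial densities 0.01 ... 50.00
best = max(grid, key=lambda x: prob_observed(s, n, v, x))
closed_form = -math.log(s / n) / v                   # equation (1)
print(best, round(closed_form, 2))
```

The grid maximum and the closed-form estimate should agree to within the grid spacing.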

If we possess this knowledge, it is best to select v so that the expected 
number of organisms per sample lies somewhere between 1 and 2. For 
this choice the expected percentage of sterile samples will lie between 
15% and 35%. In default of this knowledge, the practice is to use 
several dilutions (i.e. several different values of v) in the hope that at 
least one of them will give some sterile and some fertile samples. 



THREE DILUTIONS

The case of three dilutions serves to illustrate the general problem.
Let the suffix i indicate the dilution. For the ith dilution the volume of
the sample is v_i, and s_i out of n_i samples are found to be sterile. How
do we estimate S from these results?

From equation (1) we can obtain a separate estimate for each dilution:
i.e.

$$d_i = -\frac{2.303}{v_i} \log\frac{s_i}{n_i}.$$

However, the best way to combine the three estimates into a single
value is not obvious. Since, as we have seen, some dilutions give very
poor estimates, it is not satisfactory to take the arithmetic mean.

One solution is provided by the m.p.n. concept, which extends easily
to this situation. Following the approach used in the previous section,
we first write down the probability of obtaining the observed results for
any hypothetical value of the true density S. The observed results are
that s_1 samples out of n_1 are sterile at the first dilution, s_2 out of n_2 at
the second, and s_3 out of n_3 at the third. The probability that these
three events should all happen is the product of three terms, each like
expression (3) in the previous section. As before, the graph of this
probability against S shows a single maximum. The value of S at this
maximum is taken as the m.p.n.

The value of the m.p.n. cannot be written down explicitly. The
equation which it satisfies is as follows.

$$s_1 v_1 + s_2 v_2 + s_3 v_3 = \frac{(n_1 - s_1) v_1 e^{-v_1 d}}{1 - e^{-v_1 d}} + \frac{(n_2 - s_2) v_2 e^{-v_2 d}}{1 - e^{-v_2 d}} + \frac{(n_3 - s_3) v_3 e^{-v_3 d}}{1 - e^{-v_3 d}}$$



Methods for solving this equation by trial and error have been given by
several writers: e.g. Halvorson and Ziegler (3), Barkworth and Irwin (1)
and Finney (2). In laboratories where the numbers of samples n_i and
the dilution ratios are standardized, it is convenient to have a table which 
gives the m.p.n. for all sets of results that are likely to occur. A table 
is provided in "Standard methods for the examination of water and 
sewage" (5), for dilution series in which 5 samples are taken at each 
dilution and there are three 10-fold dilutions. A more extensive table, 
for dilution ratios of 2, 4, and 10 and any number of levels (except two 
levels with a 10-fold dilution) is given by Fisher and Yates (6). This is 
not a table of the m.p.n., but of a different estimate which seems to be 
just about as precise for series of the size usually conducted in practice. 
This estimate is derived from the total numbers X and Y of fertile and 
sterile samples. The quantities x = X/n, y = Y/n are entered in the 
table, from which an estimate of log d is obtained. 
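
Where no table is at hand, the m.p.n. equation for several dilutions can also be solved numerically. The sketch below uses bisection on a logarithmic scale as a stand-in for trial and error; it is not the procedure of any of the authors cited, and the function name, the search bounds and the example data are assumptions.

```python
import math

def mpn(volumes, n_samples, n_sterile, lo=1e-9, hi=1e9, iters=100):
    """Solve sum(s_i v_i) = sum((n_i - s_i) v_i e^{-v_i d} / (1 - e^{-v_i d})) for d."""
    fertile = [n - s for n, s in zip(n_samples, n_sterile)]
    if sum(fertile) == 0:
        return 0.0           # all samples sterile
    if sum(n_sterile) == 0:
        return math.inf      # all samples fertile

    lhs = sum(s * v for s, v in zip(n_sterile, volumes))

    def rhs(d):
        # -expm1(-x) equals 1 - e^{-x} and stays accurate when x is very small
        return sum(f * v * math.exp(-v * d) / -math.expm1(-v * d)
                   for f, v in zip(fertile, volumes))

    # rhs(d) falls steadily as d grows, so the root is bracketed by lo and hi
    # (assumed wide enough for any realistic density).
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if rhs(mid) > lhs:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical result: 5 samples at each of 1, 0.1 and 0.01 ml.,
# with 0, 2 and 5 of them sterile respectively.
d = mpn([1.0, 0.1, 0.01], [5, 5, 5], [0, 2, 5])
print(round(d, 1))
```

With a single dilution the same routine reduces to equation (1), which gives a convenient check on the implementation.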

CRITIQUE OF THE M.P.N.

We have seen that the m.p.n. is an estimate of the density of organ- 
isms. Considered more generally, it is a procedure for obtaining estimates, 
since the same argument could be applied to other statistical problems. 
The only justification which I have mentioned for the procedure is 
that it seems intuitively reasonable. From a reading of the literature 
I am not certain as to the reasons which led early investigators to select 
this estimate, though either the intuitive approach or an appeal to a 
theory of inverse probability may have been responsible. 

During the past 25 years the problem of making estimates from data 
has received much attention from statisticians. Today, most statis- 
ticians would, I believe, reject an appeal to intuition or to the theory of 
inverse probability as a reliable procedure for constructing estimates, 
since both have been found on occasion to be untrustworthy. They 
might also object to the name "most probable number," on the grounds 
that the adjective "probable" in that phrase has a different meaning 
from the one given to it in the theory of probability. The estimate is 
"most probable" only in the roundabout sense that it gives the highest 
probability to the observed results. But they would not reject the m.p.n. 
procedure itself, which has come to be regarded as a remarkably reliable 
tool of very wide utility. At the risk of a slight digression it is interesting 
to indicate the reasons for the reputation which the method has acquired. 

The modern approach is to appraise any method of estimation by 
results. For the m.p.n. this is done, ideally, by conducting a large num- 
ber of dilution series with given v's and n's, in circumstances where the 
true density is known. For each series the density is estimated by the 
m.p.n., so that we accumulate a large number of observations on the 
amounts by which the m.p.n. is in error. These observations can be
summarized conveniently by plotting the frequency distribution of the
m.p.n. about the true density. If this frequency distribution groups 
very closely about the true density, we know that the estimates are 
usually good. Such a set of experiments would be difficult and expensive 
to conduct, but if we assume that the mathematical analysis which has 
been applied to the dilution method is valid, we can work out the 
frequency distribution by purely mathematical methods. 

As the numbers of samples n_i become large, the frequency distribution
of such an estimate (m.p.n. or other) usually tends to assume a
certain limiting form, the normal distribution. An important general
result has been established about these limiting distributions (7), to the 
effect that the limiting distribution of the m.p.n. has the smallest 
standard deviation that can be achieved by any method of estimation. 
Roughly speaking, this means that the m.p.n. gives on the average at 
least as precise estimates as any other method used on the same data. 
There is no point in seeking further for a more precise estimate. The 
theorem cannot be proved in general when the numbers of samples are 
small, but experience suggests that the m.p.n. technique is among the 
best methods of estimation in this case also. Consequently the m.p.n. 
method is now generally used in a great variety of problems of statistical 
estimation, though it more frequently goes by the name of the "method 
of maximum likelihood." 
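
The appraisal "by results" described above can be imitated on a computer by drawing many artificial dilution series at a known density, under the two assumptions of the mathematical analysis, and recording the error of each m.p.n. The sketch below is one such simulation; the solver, the seed, the density and the series layout are all assumptions made for illustration.

```python
import math
import random
import statistics

def mpn(volumes, n_samples, n_sterile, lo=1e-9, hi=1e9):
    """Bisection solution of the m.p.n. equation (search bounds are assumed)."""
    fertile = [n - s for n, s in zip(n_samples, n_sterile)]
    if sum(fertile) == 0:
        return 0.0
    if sum(n_sterile) == 0:
        return math.inf
    lhs = sum(s * v for s, v in zip(n_sterile, volumes))
    for _ in range(100):
        mid = math.sqrt(lo * hi)
        rhs = sum(f * v * math.exp(-v * mid) / -math.expm1(-v * mid)
                  for f, v in zip(fertile, volumes))
        if rhs > lhs:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def simulate_log_errors(density, volumes, n, runs, seed=1):
    """Return log10(d / density) for simulated series; degenerate series are skipped."""
    rng = random.Random(seed)
    p_sterile = [math.exp(-v * density) for v in volumes]
    errors = []
    for _run in range(runs):
        # each sample is sterile with probability e^{-v S}
        sterile = [sum(rng.random() < p for _tube in range(n)) for p in p_sterile]
        d = mpn(volumes, [n] * len(volumes), sterile)
        if 0.0 < d < math.inf:
            errors.append(math.log10(d / density))
    return errors

errors = simulate_log_errors(density=10.0, volumes=[1.0, 0.1, 0.01], n=5, runs=2000)
print(len(errors), round(statistics.median(errors), 2),
      round(statistics.stdev(errors), 2))
```

A histogram of `errors` gives the frequency distribution of the m.p.n. about the true density on a logarithmic scale. Such a simulation inherits the assumptions of the analysis and so cannot reveal failures of random mixing or of growth.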

THE PLANNING OF DILUTION SERIES 

In preparation for an estimation by the dilution method, three de- 
cisions must be made: (i) what range is to be covered: i.e. what are to 
be the highest and lowest sample volumes; (ii) what dilution factor is 
to be used; and (iii) how many samples should be taken for each dilution. 

Specific decisions must depend on a knowledge of the limits within 
which the true density is likely to lie and on the precision desired in the 
estimate. The way in which precision is to be measured needs some 
comment. Suppose that the true density is thought to lie somewhere 
between say 2 and 400 organisms per ml. No matter where the true 
density should happen to be within this range, we want to plan the series 
so that the estimate will have a specified "precision." This might be 
taken to mean that the standard error of the estimated density should be 
say 30 organisms. But this does not seem a reasonable definition of 
"equal precision," because although an estimate of 360 ± 30 organisms 
seems satisfactorily precise, an estimate of 5 ± 30 organisms seems very 
imprecise. Instead, we take "equal precision" to imply that the standard 
error bears a constant ratio to the true density, in other words that the 
coefficient of variation of the estimated density is constant. A further
potent reason for adopting this concept is that in a well-designed series
the m.p.n. estimates do have approximately the property that the coefficient
of variation is independent of the true density. Thus in a sense we
are making a virtue of necessity.

The following remarks are intended as a rough guide in the planning 
of dilution series. They were derived from investigations of the precision 
of the m.p.n. 

HIGHEST AND LOWEST SAMPLE VOLUMES 

These are determined by the range of densities with which we expect 
to have to cope. With a single dilution it was mentioned that for the 
best results the expected number of organisms in the sample volume v 
should lie between 1 and 2. It follows that in a series of dilutions the 
expected number of organisms in the highest sample volume v_H should
be at least 1, otherwise there is a risk that all samples will be sterile.
Similarly the expected number of organisms in the lowest sample volume
v_L should not exceed 2, to avoid the risk that all samples will be fertile.
This line of reasoning would lead to the rule that a dilution series is
capable of estimating any density that lies between 1/v_H and 2/v_L.

This rule is satisfactory if a substantial number of samples, say 
20 or more, are being taken at each dilution. With very small numbers 
of samples per dilution, which are typical in certain lines of work, the 
rule is not quite stringent enough, in that it allows too much risk that 
all samples may be fertile. Suppose that we have three 10-fold dilutions, 
with sample volumes 0.01, 0.1 and 1 ml. This series should be able to 
estimate any true density between 1 and 200 organisms per ml. If, 
however, the density happens to be 200 per ml., so that the expected 
number of organisms per sample in the lowest sample volume is 2, then 
the probability of a sterile sample at this dilution is e^{-2}, or 0.135. The
probability of a fertile sample is 0.865. If only four samples are used per
dilution, the probability that all four are fertile is (0.865)^4, or 0.56. At
the two higher concentrations, all samples are practically certain to be
fertile. Thus the worker runs about a 50-50 chance that all his samples
will be fertile, which usually necessitates repetition of the series. On the
other hand, with 20 samples per dilution, the probability that all are
fertile is (0.865)^20, or only about 0.05.
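
The arithmetic of this example is easily reproduced. In the sketch below the function name is an assumption; the density, sample volume and numbers of samples are those of the text.

```python
import math

def prob_all_fertile(density, volume, n):
    """Probability that all n samples of the given volume show growth."""
    p_fertile = 1.0 - math.exp(-volume * density)
    return p_fertile ** n

# Density 200 per ml. and lowest sample volume 0.01 ml., as in the text.
for n in (4, 20):
    print(n, round(prob_all_fertile(density=200, volume=0.01, n=n), 2))
# prints: 4 0.56
#         20 0.05
```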

Thus in small experiments it is safer to reduce the upper density
value from 2/v_L to 1/v_L. In practice, we use this rule by first guessing
two limits S_L and S_H between which we are fairly certain that the true
density lies. The sample volumes are then chosen to satisfy the rules

$$v_H \ge \frac{1}{S_L}, \qquad v_L \le \frac{1}{S_H}.$$

For example, if we are confident that the density lies between 10 and 750
per ml., the highest sample volume should be at least 1/10, or 0.1 ml.
The lowest sample volume should not be more than 1/750 ml. The
three 10-fold dilutions 1/10, 1/100 and 1/1000 ml., or the four 5-fold
dilutions 1/10, 1/50, 1/250 and 1/1250, would amply cover this range
of densities.
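
The two rules can be applied mechanically to any proposed set of sample volumes. The sketch below checks the two series of the example; the function name `covers_range` is hypothetical.

```python
def covers_range(volumes, s_low, s_high):
    """True if the highest volume is at least 1/S_L and the lowest at most 1/S_H."""
    v_high, v_low = max(volumes), min(volumes)
    return v_high >= 1.0 / s_low and v_low <= 1.0 / s_high

ten_fold = [1 / 10, 1 / 100, 1 / 1000]
five_fold = [1 / 10, 1 / 50, 1 / 250, 1 / 1250]
print(covers_range(ten_fold, 10, 750), covers_range(five_fold, 10, 750))
# prints: True True

# Dropping the last 10-fold dilution leaves the upper densities uncovered.
print(covers_range([1 / 10, 1 / 100], 10, 750))
# prints: False
```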

THE DILUTION RATIO 

As regards the selection of a dilution ratio, there are two relevant 
results. If the total number of samples in the whole series is kept fixed, 
the average precision is practically the same for any dilution ratio 
between 2 and 10. The advantage of a low dilution ratio, which requires 
more work, is that the precision is more nearly constant throughout the 
range of densities between 1/v_H and 1/v_L. These points may be illustrated
by a comparison between the dilution ratios 2 and 10, in series
designed to cover the same range of densities and to use the same total 
number of samples, 72. The details for the two series are as follows.

Dilution    No. of samples    Volumes of samples (ml.)
ratio       per dilution

   2              9           .01, .02, .04, .08, .16, .32, .64, 1.28
  10             24           .01, .10, 1.00

The two series should cover a range of densities from 1/v_H to 1/v_L, or
from about 1 to 100 organisms per ml. The dilution ratio 2 requires 
eight dilutions, with 9 samples per dilution, whereas the dilution ratio 
10 requires only 3 dilutions and allows 24 samples per dilution. 

In Figure 1 the standard error of the m.p.n., expressed as a percent 
of the true density, is plotted against the true density (on a log scale). 
With both dilution ratios the standard error per cent is fairly constant 
for any true density between 1 and 100 organisms per ml. Outside these 
limits the standard error begins to rise steeply, except that with the
10-fold series, which has 24 samples per dilution, the rise is postponed until
S = 200, for reasons given in the previous section. Inside the limits the
standard error shows a periodic fluctuation which is noticeable with the 
10-fold dilution but negligible for the 2-fold. With a 5-fold dilution 
(not shown), this periodic effect would be just perceptible. It is present 
with the 10-fold series because practically all the information is
contributed by a single dilution. When the true density is about 1.5 or 15
or 150, so that one of the dilutions has about 1.5 organisms per sample,
there is a trough, with peaks in the intervening densities where no sample 
has a density close to this value. With the 2-fold series, several dilutions 
contribute information and the periodic effect is smoothed out. On the 
whole, the 2-fold dilution gives a slightly lower standard error over the 
range from 1 to 100 organisms per ml., the difference being about 7 per 
cent. For these reasons a low dilution ratio is preferable if the extra 
work involved can be accomplished easily. 




FIGURE 1. COMPARISON OF DILUTION RATIOS 2 AND 10
(Horizontal axis: True Density (Organisms per ml.), log scale.)



The curves in Figure 1 were calculated by assuming that the formula 
which holds for the standard error in the limiting distribution, appropri- 
ate for very large samples, could be applied to this example in which the 
total number of samples is 72. Some unpublished work by Dr. I. J. 
Bross on the distribution of the m.p.n. in small samples indicates that the 
standard errors are higher than those obtained in this way from the 
limiting distribution. Further, the periodicity with the 10-fold dilution 
does not follow the course predicted for it. However, the two principal 
conclusions from Figure 1 still appear to hold in small samples, namely 
that the standard error is more stable with a low dilution ratio, and also 
tends to be slightly lower.* 
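
The formula used for the curves is not reproduced in the text. As a hedged sketch, the code below assumes the standard large-sample result for a maximum-likelihood estimate: each dilution contributes n v^2 e^{-vS} / (1 - e^{-vS}) to the information about S, and the limiting standard error is the reciprocal square root of the total. The function name and the densities tabulated are assumptions; the two series are those compared in Figure 1.

```python
import math

def limiting_se_percent(density, volumes, n):
    """Large-sample standard error of the m.p.n., as a per cent of the true density."""
    info = 0.0
    for v in volumes:
        x = v * density
        # information contributed by n samples of volume v (assumed ML formula)
        info += n * v * v * math.exp(-x) / -math.expm1(-x)
    return 100.0 / (density * math.sqrt(info))

two_fold = [0.01 * 2 ** k for k in range(8)]   # .01 ... 1.28 ml., 9 samples each
ten_fold = [0.01, 0.10, 1.00]                  # 24 samples each

for density in (1, 5, 15, 50, 100):
    print(density,
          round(limiting_se_percent(density, two_fold, 9), 1),
          round(limiting_se_percent(density, ten_fold, 24), 1))
```

Under this assumption the 10-fold series shows a trough near a density of 15 and a peak near 50, while the 2-fold series varies little over the range, in line with the description of the figure.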

*This work was carried out under contract with the Office of Naval Research.

STANDARD ERROR OF THE M.P.N.



In many types of investigation there may be only a few samples for 
each dilution. In this event the distribution of the estimated density d 
is very skew, and to attach a standard error to d is misleading. The 
distribution of log d is more nearly symmetrical, and it is recommended 
that tests of significance and the construction of confidence limits be 
performed from log d rather than from d. If there are n samples per
dilution (assumed the same in all dilutions), the standard error of log_10 d
may be taken as

$$\mathrm{S.E.}(\log_{10} d) = 0.55 \sqrt{\frac{\log_{10} a}{n}},$$

where a is the dilution ratio. This formula can be used for any density 
which lies between 1/v_H and 1/v_L, and for any dilution ratio of 5 or less.
For a dilution ratio of 10, a more conservative factor of 0.58 is preferable
to 0.55, to allow for the contingency that the estimation may have been
made at a point where the standard error has one of its peaks. Thus
for dilution ratio 10 the formula becomes simply 0.58/√n. Note that
the formula does not explicitly involve the number of dilutions used.

To test the significance of the difference between two estimated
densities, made from independent series, we compute

$$\frac{\log d_1 - \log d_2}{\sqrt{[\mathrm{S.E.}(\log d_1)]^2 + [\mathrm{S.E.}(\log d_2)]^2}}$$

and refer to the normal probability tables.

The construction of confidence limits may be illustrated by assuming
that we have three 10-fold dilutions, with 5 samples per dilution. The
standard error of log d is 0.58/√5, or 0.259, so that the 95 per cent
confidence limits for log d are (log d ± 0.518). It follows that to get the
upper confidence limit for d, we must multiply d by antilog (0.518) or
3.3, and to get the lower confidence limit we must divide d by 3.3.

For the common dilution ratios, 2, 4, 5, and 10, Table I shows the
standard error of log d for any number of samples per dilution between
1 and 10. The table also gives the factor by which the estimated density
must be multiplied and divided in order to obtain upper and lower 95
per cent confidence limits respectively. In the example presented by
Fisher and Yates (6), the number of rope spore organisms per gram of
potato flour was estimated to be 760. The dilution ratio was 2 and there
were 5 tubes per dilution. From Table I, the factor for n = 5, a = 2 is
1.86. Hence the upper confidence limit is 760 × 1.86 or 1414, while the
lower limit is 760/1.86 or 409. This factor clearly fulfills the same
general purpose as would a standard error, if it had been appropriate to
attach one to d.

TABLE I

STANDARD ERROR OF LOG d AND FACTOR FOR CONFIDENCE LIMITS

No. of                                      Factor for 95%
samples      Standard error of log d       confidence limits
per dil.       Dilution ratio (a)          Dilution ratio (a)
   n        2      4      5     10        2      4      5     10

   1      .301   .427   .460   .580     4.00   7.14   8.32  14.45
   2      .213   .302   .325   .410     2.67   4.00   4.47   6.61
   3      .174   .246   .265   .335     2.23   3.10   3.39   4.68
   4      .150   .214   .230   .290     2.00   2.68   2.88   3.80
   5      .135   .191   .206   .259     1.86   2.41   2.58   3.30
   6      .123   .174   .188   .237     1.76   2.23   2.38   2.98
   7      .114   .161   .174   .219     1.69   2.10   2.23   2.74
   8      .107   .151   .163   .205     1.64   2.00   2.12   2.57
   9      .100   .142   .153   .193     1.58   1.92   2.02   2.43
  10      .095   .135   .145   .183     1.55   1.86   1.95   2.32
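
The confidence factor and the test of significance can be sketched as follows. The function names are assumptions, the multiplier 2 follows the worked example (0.518 = 2 × 0.259), and the second density used in the test is hypothetical.

```python
import math

def confidence_limits(d, se_log10, z=2.0):
    """Lower limit, upper limit and factor: d is divided and multiplied by antilog(z * S.E.)."""
    factor = 10 ** (z * se_log10)
    return d / factor, d * factor, factor

def z_difference(d1, se1, d2, se2):
    """Normal deviate for the difference between two independent estimates of log10 d."""
    return (math.log10(d1) - math.log10(d2)) / math.sqrt(se1 ** 2 + se2 ** 2)

# Potato flour example: d = 760, a = 2, n = 5, so S.E.(log d) = .135.
low, high, factor = confidence_limits(760, 0.135)
print(round(factor, 2), round(low), round(high))

# Hypothetical comparison with a second series estimating 300 organisms.
print(round(z_difference(760, 0.135, 300, 0.135), 2))
```

The limits differ by a unit or so from the 409 and 1414 of the example, since the text rounds the factor to 1.86 before multiplying and dividing.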

The table makes it evident that the dilution method is of low pre- 
cision, as is to be expected from a method that does not use direct 
counts. Large numbers of samples must be taken at each dilution if a 
really precise result is wanted. Further, the table is likely to over- 
estimate the accuracy of the method, since it is derived on the assumption 
that the mathematical analysis corresponds exactly to the practical 
situation. With a large volume of liquid that cannot be mixed, the 
distribution of organisms may be far from homogeneous. The method 
will determine the density in that part of the liquid from which the 
initial sample was taken. This might be very different from the average 
density over the whole liquid, and this source of error could be more 
important than the error in the dilution method itself. 

SUMMARY OF STEPS IN PLANNING 

The decisions to be made involve a choice of the dilution ratio, a, 
the number of dilutions and the actual sample volume in each dilution, 
and finally the number of samples n to be used at each dilution. The 
steps may be set out as follows. 

1. Decide on the limits S_L and S_H within which the true density
appears certain to lie.

2. Calculate the lowest and highest sample volumes by means of
the relations

$$v_L = \frac{1}{S_H}, \qquad v_H = \frac{1}{S_L}.$$

3. Select a dilution ratio. A low ratio is preferable whenever 
feasible. 

4. The number of dilutions and the actual volumes for each dilution
may now be chosen so as to satisfy the requirements that the highest
sample volume must not be less than v_H and the lowest must not
exceed v_L.

5. The precision to be expected for any specified number n of samples 
per dilution may be appraised from Table I, if the number of samples 
per dilution is less than 10, or from the formula for S.E.(log d). Choose
the number of samples in the light of the precision that is desirable 
and the amount of work that it is practicable to do. 
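
The five steps can be strung together in a short planning routine. The sketch below is one possible arrangement, not a prescribed procedure: the function name, the choice to start exactly at v_H = 1/S_L and the use of the factor 0.55 for every ratio other than 10 are assumptions (the text recommends 0.55 only for ratios of 5 or less).

```python
import math

def plan_series(s_low, s_high, ratio, n):
    """Return sample volumes, S.E. of log10 d and the 95% confidence factor."""
    v_high = 1.0 / s_low            # step 2: highest sample volume
    v_low = 1.0 / s_high            # step 2: lowest sample volume
    volumes = [v_high]
    while volumes[-1] > v_low:      # step 4: dilute until the lowest volume <= v_L
        volumes.append(volumes[-1] / ratio)
    factor = 0.58 if ratio == 10 else 0.55
    se = factor * math.sqrt(math.log10(ratio) / n)   # step 5: precision
    return volumes, se, 10 ** (2 * se)

# Limits 10 and 750 per ml., as in the earlier example, with 5 samples per dilution.
for ratio in (10, 5):
    volumes, se, conf_factor = plan_series(10, 750, ratio, 5)
    print(ratio, [round(v, 4) for v in volumes], round(se, 3), round(conf_factor, 1))
```

For these limits the routine returns the three 10-fold and the four 5-fold dilutions mentioned earlier, together with the precision to be expected from 5 samples per dilution.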